In this activity, we will learn and practice some of the useful Python syntax and functionality that will help you work through the FIT5202 activities during the semester. Python is a popular, powerful open-source programming language for building data science applications.
Throughout the semester, we will use Jupyter notebooks, which provide an interactive environment for writing and running code (e.g. Python and R). Each notebook is associated with a single kernel. As we will be using Python, this notebook is associated with the IPython kernel. Before you run your Python code, make sure that you have chosen the "Python 3" kernel (check the upper-right corner of this notebook), and add the code in "Code" cells, not "Markdown" cells. Markdown is a popular lightweight markup language that also accepts inline HTML.
Python is an interpreted programming language. Thus, you can simply write commands directly into the interpreter and execute them. We will be using Python 3. The following is the plan of our tasks to be done this week:
Let's get started!
What you need to remember:
We can import a module using the import command. For example, if we want to import a module (numpy), we can use the following code: import numpy as np. Here, we can use "as" to rename the module.
Let's first print a string, "Hello World!". In Python 3, print is a function, so you need to call it with parentheses:
print("Hello World!")
Execute this command in a new cell (i.e. add a new cell below).
In Python, we can define variables without declaring a type. Run the code below in the next cell:
number = 3.14
message = "this is a string message"
print(message, number - number/10)
Do you understand what the above expression produces?
Note that we can use the "#" character to add a comment to a Python script. The rest of the line after "#" is ignored, unless the "#" appears inside a string. Run each of the following lines and see the result:
query1 = "this is a string"  # this is a comment
print(query1)
query2 = "# this is not a comment"
print(query2)
We can simply use Python code for calculation. You can type an expression, and the Python interpreter will immediately write the result. Run each line of the code below:
1+2
10 - 2*3
10/5
5/2
Note that division with "/" in Python always returns a float. To get an integer result that discards any fractional part, you can use the "//" (floor division) operator. Run each line of the code:
11/3
11//3
To calculate the remainder, use "%". To calculate powers, use "**". Run each line of the code:
11%3
2**5
Python also provides very useful built-in functions for arithmetic operations, such as round(). For more details, see: https://docs.python.org/3.8/library/functions.html#round
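As a brief illustration of the built-in arithmetic functions mentioned above, the following sketch uses round(), abs(), divmod() and pow(). The variable names are our own and are only there for illustration.

```python
# A few built-in functions for arithmetic (variable names are illustrative)
rounded = round(3.14159, 2)            # round to 2 decimal places
magnitude = abs(-7)                    # absolute value
quotient, remainder = divmod(11, 3)    # same as (11 // 3, 11 % 3)
power = pow(2, 5)                      # same as 2 ** 5

print(rounded, magnitude, quotient, remainder, power)
```

Note that divmod() combines the "//" and "%" operators described above into a single call.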
The following explains the comparison operators with their usage. A comparison operator is used to compare one operand to another, and returns either True or False.
Run each line of the following code and see the result:
1 == 2
1 != 2
1 < 2
1 <= 2
1 > 2
1 >= 2
We will learn about basic data structures in Python. In particular, we will look into the following:
Python easily handles strings. They can be enclosed in single quotes ('...') or double quotes ("...") with the same result. The character "\" can be used to escape quotes.
Run the code below:
"hello world"
"\"Yes\" he said"
'"Yes" he said'
String literals can span multiple lines by using triple-quotes: """...""" or '''...'''
print("""hello
world""")
Strings can be concatenated using "+". Run the code:
"Py" + "thon"
Also, strings can be repeated using "*". For example, to repeat "a" three times, run the code:
3 * "a"
We can also index strings. The first character has index 0. Run the code:
word = "python"
word[0]
word[5]
Also, we can slice a string, which can be used to obtain substring. Run the code:
word[0:2]
What does it look like? When slicing, remember that the first index is included and the second index is excluded.
If we omit the first index, the default is 0. If we omit the second index, the default is the length of the string. Run each line of the code:
word[:2]
word[3:]
word[-2:]
Note that the last example uses a negative index, which counts from the end of the string. It extracts the characters from the second-last to the end.
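To make the slicing rules above concrete, here is a small sketch. It reuses the word variable from the text; the other variable names are our own.

```python
word = "python"

first_two = word[:2]     # characters at index 0 and 1
from_three = word[3:]    # index 3 up to the end
last_two = word[-2:]     # second-last character up to the end

# A slice that runs past the end of the string is cut off, not an error
beyond = word[4:100]

print(first_two, from_three, last_two, beyond)

# word[:i] + word[i:] always rebuilds the original string
print(word[:2] + word[2:] == word)
```

Indexing a single character out of range (for example word[100]) does raise an IndexError, so the forgiving behaviour applies to slices only.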
One of the compound data types is the list. A list contains comma-separated items between square brackets. Run the code:
myList = [1, 3, 5, 7]
myList
Lists can also be indexed and sliced like strings. Run each line of the code:
myList[0]
myList[-1] # the last element
myList[1:]
myList[:]
Also, we can concatenate lists. Run the code:
myList = myList + [9, 11]
Do you want to change an item in the list? Run the code:
myList[1] = 4
Check the elements of myList.
We can also add new items at the end of the list using the append() method. Run the code:
myList.append(13)
The length of the list can be obtained using a built-in function: len(). Print the length of myList!
A powerful feature of lists is that they can be nested, i.e. a list can contain other lists. Run the code:
A = ['a', 'b', 'c']
B = [1, 2, 3]
C = [A, B]
C
C[0]
C[0][1]
Let's see how to delete an item from a list. To delete the item 'a' from the list A above, run the code:
A.remove('a')
Alternatively, we can delete an item using its index. Run the code:
del A[0]
Also, we can get the index of an item. Run the code:
A = ['a', 'b', 'c']
A.index('b')
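The following sketch puts remove(), del and index() side by side. The list name letters is hypothetical, and the duplicate 'b' is included on purpose to show that remove() and index() only act on the first match.

```python
letters = ['a', 'b', 'c', 'b']

letters.remove('b')         # removes the first 'b' only
print(letters)              # ['a', 'c', 'b']

del letters[0]              # removes the item at index 0 ('a')
print(letters)              # ['c', 'b']

position = letters.index('b')   # index of the first 'b'
print(position)

# index() raises ValueError for a missing item, so check membership first
if 'z' in letters:
    print(letters.index('z'))
else:
    print("'z' is not in the list")
```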
A set is an unordered collection in which duplicate elements are not allowed. Using a set, we can easily apply set operations such as union, intersection, and difference. Curly braces or the set() function can be used to create sets. Run the code:
A = {'a', 'b', 'c'}
A
A = {'a', 'b', 'c', 'c'}
A
Check the membership of an item in the set. Run the code:
'a' in A
Now, let's see how to use the set operations. Run the code:
A = {'a', 'b', 'c'}
B = {'a', 'd'}
A.union(B)         # or A | B
A.intersection(B)  # or A & B
A.difference(B)    # or A - B
Now let's see the difference between union and update. The update method changes the set in place, while the union method leaves the original set alone and returns a new set instead. Run the code:
AA = {'a', 'b'}
BB = {'c'}
AA.update(BB)
print(AA)
AA = {'a', 'b'}
BB = {'c'}
AA.union(BB)
print(AA)
Let's add an item to a set using the add() method. Run the code:
A.add('d')
A
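As a short sketch of why sets are handy, the example below removes duplicates from a list and checks that the operator forms give the same results as the methods. All variable names here are hypothetical.

```python
# Removing duplicates from a list by converting it to a set
votes = ['a', 'b', 'a', 'c', 'b']
unique_votes = set(votes)
print(len(votes), len(unique_votes))

# The operators and the methods give the same results
left = {'a', 'b', 'c'}
right = {'a', 'd'}
print((left | right) == left.union(right))
print((left & right) == left.intersection(right))
print((left - right) == left.difference(right))

# {} creates an empty dictionary, so use set() for an empty set
empty = set()
print(type(empty), type({}))
```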
Another useful data structure is the dictionary. In a dictionary, items are indexed by keys. A key can be of any immutable type, such as a string or a number. Simply put, a dictionary is a collection of key-value pairs in which keys must be unique (since Python 3.7, dictionaries also preserve insertion order). A pair of braces creates an empty dictionary (i.e. {}).
The following examples show different ways to construct a dictionary. To know more about them, run the code:
contact = {}
contact['a'] = 1
contact
contact2 = {'a':1, 'b':2}
contact2
contact3 = dict(a=1, b=2)
contact3
Let's retrieve all the keys. Run the code:
keys = contact2.keys()
keys
Also, if we want to create a list consisting of the keys of a dictionary, run the following:
list(contact2.keys())
If you want to check whether an item (e.g. 'a') is a key in the dictionary or not, run the code:
'a' in contact2
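Beyond creating a dictionary and listing its keys, the most common operations are looking up, updating, and iterating. The sketch below shows these on a small dictionary; the get() default value of 0 is just an assumption chosen for the example.

```python
contact = {'a': 1, 'b': 2}

# Look up a value by its key
print(contact['a'])

# get() returns a default instead of raising KeyError for a missing key
missing = contact.get('z', 0)
print(missing)

# Iterate over the key-value pairs
for key, value in contact.items():
    print(key, value)

# Update the value of an existing key and delete another key
contact['a'] = 10
del contact['b']
print(contact)
```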
NumPy is the core library for scientific computing in Python. It provides a high-performance multi-dimensional array object. However, in this unit, we will only start using the NumPy library a little later on (in Week 5). For more information about NumPy, please refer to the following: https://docs.scipy.org/doc/numpy/reference/.
To use this library, let's first import it:
import numpy as np
We can create a numpy 1-dim array as:
a = np.array([1,2,3]) # 1-dim array
print (type(a)) # check the type of the array
print (a.shape)
print (a[0], a[1], a[2])
print(a)
Let's create a 2-dim array:
b = np.array([[1,2,3],[4,5,6]])
print(b.shape)
print(b[0,0])
NumPy also provides many functions to create arrays and manipulate array indexes.
For example, to create a 2x2 array of all zeros, run the code:
c = np.zeros((2,2))
print(c)
Basic math functions can be used as follows:
x = np.array([[1,2], [3,4]])
y = np.array([[5,6], [7,8]])
print (x+y) # elementwise sum producing an array
print(np.add(x,y)) # what about this?
print (x-y) # elementwise differences producing an array
print(np.subtract(x,y)) # what about this?
print (x*y) # elementwise product producing an array
print(np.multiply(x,y)) # what about this?
print (x/y) # elementwise division producing an array
print(np.divide(x,y)) # what about this?
print(np.sqrt(x)) # elementwise square root producing an array
Here are some examples of a useful function that performs computations on arrays: sum(). Run the code:
x = np.array([[1,2], [3,4]])
print(np.sum(x)) # sum of the all elements
print(np.sum(x, axis=0)) # sum of each column
print(np.sum(x, axis=1)) # sum of each row
Python is a powerful object-oriented programming (OOP) language. In OOP, we try to create reusable patterns of code. One important concept in OOP is the distinction between classes and objects: a class is a blueprint that defines attributes and methods, while an object is an instance created from a class.
We can define a class by using the class keyword. Similarly, we can define its functions by using the def keyword. Let's create a class using the following code:
class myClass:
    name = "A"
    def myFunc(self):
        return "hello world!"
In the myClass class, the function myFunc is also called a method. A method is a function defined within a class. Note that its argument is self, which is a reference to the object created from this class. To reference instances (or objects) of the class, self is always the first parameter of a method.
Defining this class did not create any myClass objects. Instead, we need to create an object using the class. Now we create an object that is an instance of the myClass class:
x = myClass()
Then, we can use its method(s) using the dot operator:
x.myFunc()
The constructor method is used to initialise data in a class. It runs as soon as an object of the class is instantiated. Known as the __init__ method, it is usually the first method defined in a class and looks like this:
def __init__(self):
    print("This is the constructor method.")
Add this function in the class and run the code:
x = myClass()
x.myFunc()
Can you see how the constructor works?
Now, let's create a method that uses a variable, name, that we will add. We will assign it a value in a method. Our new class will look like this:
class myClass:
    name = "" # we will use this variable
    def __init__(self):
        print("This is the constructor method.")
    def assignName(self, name):
        self.name = name
    def myFunc(self):
        return "hi " + self.name + ": hello world!"
Then, run the following code:
x = myClass()
x.assignName("YB")
x.myFunc()
We learned how to create classes, instantiate objects, initialise attributes with the constructor method, and work with more than one object of the same class. OOP makes reusing code more straightforward and effective, as objects created for one program can be used in another.
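The examples above create a single object. As a small sketch of working with more than one object of the same class, the code below passes the name directly to the constructor so that each object keeps its own value. The class name Greeter, the method name greet, and the sample names are hypothetical and not part of the activity's code.

```python
class Greeter:   # hypothetical class name, for illustration only
    def __init__(self, name):
        # Each object stores its own copy of the attribute
        self.name = name

    def greet(self):
        return "hi " + self.name + ": hello world!"

first = Greeter("YB")
second = Greeter("Alex")

print(first.greet())
print(second.greet())

# Changing one object does not affect the other
first.name = "Sam"
print(first.greet())
print(second.greet())
```

Passing the value to __init__ is one possible design; the assignName approach shown earlier works equally well when the value is only known after the object has been created.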
A for loop can iterate over a sequence of numbers produced by the "range" function. In Python 3, the range function returns a range object (an immutable sequence of numbers) rather than a list. Note that the range function is zero-based by default. Run the following examples to understand how the for loop iterates using range.
for x in range(5):
    print(x)
for x in range(6, 10):
    print(x)
Also, we can write a for loop using the range and len built-in functions to iterate over the indexes of a list. For example, a list of prime numbers can be printed in this way:
primes = [2, 3, 5, 7]
for index in range(len(primes)):
    print(primes[index])
Run the above code.
A while loop repeats as long as a certain condition is met. For example:
n = 0
while n < 5:
    print(n)
    n += 1
Run the above code.
If necessary, we can use break to exit a for loop or a while loop. On the other hand, continue skips the rest of the current iteration and moves on to the next one. For example:
n = 0
while True:
    print(n)
    n += 1
    if n >= 5:
        break
The above example behaves the same as the following code, which uses continue:
n = 0
while True:
    print(n)
    n += 1
    if n >= 5:
        break
    else:
        continue
Run the above two pieces of the code.
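In the two pieces of code above, continue sits at the very end of the loop body, so it has no visible effect. The sketch below is a hypothetical example in which continue really does skip part of an iteration, alongside break.

```python
odd_numbers = []
for n in range(10):
    if n % 2 == 0:
        continue          # skip the rest of this iteration for even numbers
    if n > 7:
        break             # leave the loop entirely once n exceeds 7
    odd_numbers.append(n)

print(odd_numbers)
```

Here the even numbers never reach the append() call, and the loop stops as soon as it meets 9.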
We've already learned how to use dictionaries in the above activity. dict() creates a new dictionary. There are several ways to create a dictionary. If no arguments are given, an empty dictionary is created. dict() also accepts a list or tuple of (key, value) pairs as its argument.
It is best to think of a dictionary as a collection of two-element (key, value) pairs.
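To illustrate the (key, value) view of a dictionary, the following sketch builds the same dictionary in two ways and then converts it back to a list of pairs. The variable names are our own.

```python
pairs = [('a', 1), ('b', 2)]      # a list of (key, value) tuples
from_pairs = dict(pairs)
from_keywords = dict(a=1, b=2)
empty = dict()                    # no arguments gives an empty dictionary

print(from_pairs)
print(from_pairs == from_keywords)
print(empty)

# items() goes the other way: from a dictionary back to (key, value) pairs
print(list(from_pairs.items()))
```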
We can use enumerate() to iterate over an iterable object. It returns an enumerate object, which can be seen as a sequence of tuples, each containing a count (index) and a value. This function is therefore very useful when we need both the index and the value of each item in a list.
To learn what this function returns and how to use it, run the following code:
menu = ['pizza', 'pasta', 'hamburger']
print(menu)
print(list(menu))
print(list(enumerate(menu)))
Using the for loop, we can iterate over the enumerate object. Look at and run the code:
for index, item in enumerate(menu):
    print(index, item)
If you want to change the start index, for example, starting with 1, use the code below:
for index, item in enumerate(menu, 1):
    print(index, item)
We've already looked at how to use len(). It returns the length (i.e. the number of items) of an object.
max() returns the maximum value in a given list, while min() returns the minimum. Run the code:
a = [1, 3, 5]
print(max(a))
print (min(a))
We've already looked at how to use the range() function with a for loop. It generates a sequence of numbers to iterate over.
The range() function can be called in two ways. Let's look at how it is used in each case.
range(stop): stop is an integer, and the function returns the numbers from 0 up to, but not including, stop. That is, list(range(3)) gives [0, 1, 2]. Run the code:
for i in range(5):
    print(i)
range([start], stop, [step]): start is the first number of the sequence, stop is the same as above, and step is the interval between consecutive numbers in the sequence. Note that start and step are optional parameters, and step can only be given together with start. Run the code:
for i in range(1, 5): # start, stop
    print(i)
for i in range(1, 5, 2): # start, stop, step
    print(i)
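The following sketch shows a few more properties of range() that follow from the parameters described above: the result is a range object rather than a list, and a negative step counts down. The variable names are our own.

```python
numbers = range(1, 10, 3)

print(numbers)            # a range object, not a list
print(list(numbers))      # convert to a list to see the values

# A negative step counts down; stop is still excluded
countdown = list(range(5, 0, -1))
print(countdown)

# A range supports len() and membership tests without building a list
print(len(numbers), 7 in numbers)
```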
Using the sorted() function, we can easily sort a list in ascending order. It returns a new list and leaves the original unchanged. Run the code:
a = [3, 1, 7, 5, 9]
sorted(a)
If you want to generate a list sorted in descending order, pass the parameter reverse=True to the function:
sorted(a, reverse=True)
We can also use the list.sort() method, which sorts the list in place. Run the code:
a.sort()
a
a.sort(reverse=True)
a
Now, let's see how to sort a dictionary in Python. We've learned that a dict object is a useful container that can store a collection of key-value pairs. As an example, look at the following dictionary: myDic = {'a':1, 'b':3, 'c':2}
In the myDic object, the keys are a, b, and c, while the values are 1, 3, and 2. By calling the list() function on it, we can easily retrieve the keys. Run the code:
myDic = {'a':1, 'b':3, 'c':2}
list(myDic)
As you can see, list() returns the keys in insertion order, which is not necessarily sorted. If we want to order the keys of the dictionary, we can use sorted(). Run the code:
print(sorted(myDic))
print(sorted(list(myDic)))
The above two results should be the same. Can you identify why?
On the other hand, if we want to sort the values of the dictionary, we can use the following example:
print(sorted(myDic.values()))
Finally, if we want to iterate over the dictionary in sorted key order, see the following example:
for key, value in sorted(myDic.items()):
    print(key, value)
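The examples above sort the keys, or the values on their own. If we want the (key, value) pairs ordered by value, one option is the key parameter of sorted(). The sketch below assumes a small helper function, get_value, which is our own name and not part of the activity.

```python
myDic = {'a': 1, 'b': 3, 'c': 2}

def get_value(pair):
    # pair is a (key, value) tuple; sort on the value
    return pair[1]

# Sort the (key, value) pairs by value
by_value = sorted(myDic.items(), key=get_value)
print(by_value)

# Largest value first
by_value_desc = sorted(myDic.items(), key=get_value, reverse=True)
print(by_value_desc)

# Rebuild a dictionary whose insertion order follows the sorted pairs
sorted_dic = dict(by_value)
print(sorted_dic)
```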
The zip() function returns an iterator that aggregates elements from each of the iterables passed to it. For example, if we want to pair up elements from two lists, we can use this function. Look at and run the following example:
A = [1, 2, 3]
B = ['a', 'b', 'c']
C = zip(A, B)
print (list(C))
If the lengths of the iterables are not equal, zip stops at the shortest one, so the number of tuples equals the length of the shortest iterable. For example:
A = [1, 2, 3]
B = ['a', 'b']
C = zip(A, B)
print (list(C))
What's the result? The printed list should contain 2 tuples.
Do you want to unzip a list of tuples? Don't worry! We can do it:
C = list(zip(A,B))
newA, newB = zip(*C)
print (newA, newB)
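Two common uses of zip() are iterating over two lists together and building a dictionary from separate lists of keys and values. The sketch below shows both; the prices are made-up numbers used only for illustration.

```python
names = ['pizza', 'pasta', 'hamburger']
prices = [12, 10, 8]        # made-up numbers, for illustration only

# Iterate over two lists together
for name, price in zip(names, prices):
    print(name, price)

# Build a dictionary from a list of keys and a list of values
price_of = dict(zip(names, prices))
print(price_of)

# zip returns an iterator, so it can be consumed only once
pairs = zip(names, prices)
first_pass = list(pairs)
second_pass = list(pairs)
print(len(first_pass), len(second_pass))
```

Because the iterator is exhausted after the first pass, convert it with list() once and reuse the list if the pairs are needed more than once.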
Lambda functions are anonymous functions (i.e. functions that are not bound to a name) created at runtime with the keyword "lambda".
The following code shows the difference between a normal function ("f") defined using def and a lambda function ("g"):
def f(x):
    return 2**x
print(f(3))
g = lambda x: 2**x
print(g(3))
Run the above code.
Can you identify the differences between f() and g()? As you can see, both functions do exactly the same thing and can be used in the same way. However, note that g() does not include a "return" statement: a lambda returns the value of its single expression. Also, you can put a lambda definition anywhere a function is expected, and you don't need to assign it to a variable at all.
Let's see another example of using a lambda function, which takes this a step further. Run the code:
A = [2, 18, 9, 22, 17]
print (list(filter(lambda x: x % 3 == 0, A)))
In the above, we used a built-in function, filter(), and passed it a lambda function that tests whether a number is divisible by 3. Of course, we can define a normal function using def and then use it as an argument to filter() if we're going to use it several times, or if the function is too complex to fit in a single expression.
However, if we need it only once and it looks simple, it could be better to use a lambda function. This creates very compact, yet readable code. Run the above code and check the result.
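As a sketch of the trade-off described above, the code below writes the same filter with a def function and with a lambda, and then uses a lambda with map(), another built-in that takes a function as its argument. The function name is_multiple_of_three is our own.

```python
A = [2, 18, 9, 22, 17]

# A named function, reusable in several places
def is_multiple_of_three(x):
    return x % 3 == 0

with_def = list(filter(is_multiple_of_three, A))
with_lambda = list(filter(lambda x: x % 3 == 0, A))
print(with_def == with_lambda)

# map() applies a function to every item
doubled = list(map(lambda x: 2 * x, A))
print(doubled)

# A list comprehension is a common alternative to filter() with a lambda
with_comprehension = [x for x in A if x % 3 == 0]
print(with_comprehension)
```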
Multi-processing is a core part of parallel programming. In parallel programming, a job is split into a number of sub-tasks, and multiple processors work separately to complete it: each processor is responsible for carrying out one sub-task using its own separate memory.
The multiprocessing module in Python's standard library has powerful features. If you want to read about all the tips and details, please refer to the following: https://docs.python.org/dev/library/multiprocessing.html.
In the following, we provide a brief overview of using the Pool approach that we will use throughout the semester for parallel data processing. The Pool class is used to represent a pool of worker processes. It has methods allowing us to offload a given job to the worker processes.
There are three methods that are particularly interesting:
- Pool.apply()
- Pool.map()
- Pool.apply_async()
Let's get started!
First, we need to import the Python multiprocessing module:
import multiprocessing as mp
Second, we need to create a Pool object by defining the number of worker processes that will run in parallel. That is, we create an instance of Pool and tell it to create n_processor worker processes:
pool = mp.Pool(processes = n_processor)
Third, we call the pool.apply() method to run functionName in a worker process:
pool.apply(functionName, [argument_1, ..., argument_n])
where functionName is the function to be run, and argument_1, ..., argument_n are the arguments of functionName.
To explain, let's look at the example and run it:
import multiprocessing as mp
def cube(x):
    return x**3
pool = mp.Pool(processes = 2)
results = [pool.apply(cube, [x]) for x in range (1,5)]
print(results)
The above example shows how to calculate cube numbers using a pool of two worker processes. Each pool.apply() call is carried out by one of the two workers. pool.apply() blocks the main program until that call has finished, so the calls run one after another and the results come back in the order they were submitted. It's useful if we want to obtain results in a particular order for a given application.
We can also use the pool.map() method, which applies a function to every item of an iterable and distributes the items across the worker processes. In the above example, "results = [pool.apply(cube, [x]) for x in range (1,5)]" can be written as follows using pool.map():
results = pool.map(cube, range(1,5))
Run the code above!
In contrast to pool.apply() or pool.map(), which block until the results are ready, the pool.apply_async() method submits all tasks at once and returns immediately. Each call returns a result object, so we need to call its get() method to obtain the result once it is ready. Let's look at and run the following example:
results = [pool.apply_async(cube, [x]) for x in range (1,5)]
output = [p.get() for p in results]
print(output)
Congratulations on finishing this activity!
Having practiced today's activities, we're now ready to embark on the rest of the exciting FIT5202 activities! See you next week!